Abstract

Experiments show that for many two-state folders the free energy of the native state, $\Delta G_{ND}([C])$, changes linearly as the denaturant concentration [C] is varied. The slope, $m = d\Delta G_{ND}([C])/d[C]$, is nearly constant. According to the Transfer Model, the m-value is associated with the difference in the surface area between the native (N) and the denatured (D) states, which should be a function of $\Delta R_g^2$, the difference in the square of the radius of gyration between the D and N states. Single molecule experiments show that $R_g$ of the structurally heterogeneous denatured state undergoes an equilibrium collapse transition as [C] decreases, which implies that m should also be [C]-dependent. We resolve the conundrum between constant m-values and [C]-dependent changes in $R_g$ using molecular simulations of a coarse-grained representation of protein L and the Molecular Transfer Model, for which the equilibrium folding can be accurately calculated as a function of denaturant (urea) concentration. In agreement with experiment, we find that over a large range of denaturant concentration (> 3 M) the m-value is constant, whereas under strongly renaturing conditions (< 3 M) it depends on [C]. The m-value is constant above 3 M because the [C]-dependent changes in the surface area of the backbone groups, which make the largest contribution to m, are relatively small in the denatured state. The burial of the backbone and hydrophobic side chains gives rise to substantial surface area changes below 3 M, leading to collapse in the denatured state of protein L. Dissection of the contribution of various amino acids to the total surface area change with [C] shows that both the sequence context and residual structure are important. There are [C]-dependent variations in the surface area for chemically identical groups such as the backbone or Ala. Consequently, the transition midpoints of individual residues vary significantly (which we call the Holtzer Effect) even though global folding can be described as an all-or-none transition. The collapse is specific in nature, resulting in the formation of compact structures with appreciable populations of native-like secondary structural elements. The collapse transition is driven by the loss of favorable residue-solvent interactions and a concomitant increase in the strength of intrapeptide interactions with decreasing [C]. The strength of these interactions is non-uniformly distributed throughout the structure of protein L. Certain secondary structure elements have stronger [C]-dependent interactions than others in the denatured state.
The folding of many small globular proteins is often modeled using the two-state approximation, in which a protein is assumed to exist in either the native (N) or the denatured (D) state [1]. The stability of N relative to D, $\Delta G_{ND}(0)$, is typically obtained by measuring $\Delta G_{ND}([C])$ as a function of the denaturant concentration [C] and extrapolating to [C] = 0 using the linear extrapolation method (LEM) [2]. The denaturant-dependent change in native state stability, $\Delta G_{ND}([C])$, for these globular proteins is usually a linear function of [C] [3-9]. Thus, $\Delta G_{ND}([C]) = \Delta G_{ND}(0) + m[C]$, where $m = d\Delta G_{ND}([C])/d[C]$ is a constant [5], which by convention is referred to as the m-value. However, deviations from linearity, especially at low [C], have also been found [10], indicating that the m-value can be concentration dependent. In this paper we address two inter-related questions: (1) Why are m-values constant for some proteins, even though there is a broad distribution of conformations in the denatured state ensemble (DSE)? (2) What is the origin of denatured state collapse, that is, the compaction of the DSE with decreasing [C], which is often associated with non-constant m-values [10, 11, 12]?
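
The linear extrapolation described above can be illustrated with a short numerical sketch. The data are synthetic and purely illustrative (they are not taken from the protein L simulations), and the function name `linear_extrapolation` is hypothetical; the fit itself uses NumPy's `polyfit`.

```python
import numpy as np

def linear_extrapolation(conc, dG):
    """Least-squares fit of dG_ND([C]) = dG_ND(0) + m*[C].

    Returns (dG0, m, Cm), where Cm is the concentration at which the
    fitted line crosses zero.
    """
    m, dG0 = np.polyfit(conc, dG, 1)  # slope first, then intercept
    return dG0, m, -dG0 / m

# Synthetic, illustrative data only (kcal/mol versus M).
conc = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
dG = -5.0 + 0.8 * conc
dG0, m, Cm = linear_extrapolation(conc, dG)
```

In practice only points from the linear regime would be passed to such a fit, since deviations from linearity at low [C] would bias the extrapolated intercept.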

Potential answers to the first question can be gleaned by considering the empirical Transfer Model (TM) [13, 14], which has been remarkably successful in accurately predicting m-values for a large number of proteins [15, 16]. The revival of the TM as a practical tool in analyzing the effect of denaturants (and more generally osmolytes) comes from a series of pioneering studies by Bolen and coworkers [15-17]. Assuming that proteins exist in only two states [8, 15], the TM expression for the m-value is

$$ m = \frac{1}{[C]} \sum_{k=1}^{N_S} n_k \frac{\delta g_k^S([C])}{\alpha^S_{k,G-k-G}} \Delta\alpha_k^S + \frac{1}{[C]} \sum_{k=1}^{N_B} n_k \frac{\delta g_k^B([C])}{\alpha^B_{k,G-k-G}} \Delta\alpha_k^B \tag{1} $$



where the sums are over the side chain (S) and backbone (B) groups of the different amino acid types (Ala, Val, Gly, etc.), $n_k$ is the number of amino acid residues of type k in the protein, and $\delta g_k^S$ and $\delta g_k^B$ are the experimentally measured transfer free energies for k [13, 18]. In Eq. 1, $\Delta\alpha_k^P = \langle \alpha^P_{k,D} \rangle - \langle \alpha^P_{k,N} \rangle$ (P = S or B), where $\langle \alpha^P_{k,D} \rangle$ and $\langle \alpha^P_{k,N} \rangle$ are the average solvent accessible surface areas [19] of group k in the D and N states, respectively, and $\alpha^P_{k,G-k-G}$ is the corresponding value in the tripeptide glycine-k-glycine. There are two fundamentally questionable assumptions in the TM: (1) The free energy of transferring a protein from water to aqueous denaturant solution at an arbitrary [C] may be obtained as a sum of transfer energies of individual groups of the protein, without regard to the polymeric nature of proteins. (2) The surface area changes $\Delta\alpha_k^P$ are independent of [C], residual denatured state structure, and the amino-acid sequence context in which k is found.
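
As an illustration of how a TM estimate of m is assembled from group contributions, the following is a minimal sketch. It assumes that each transfer free energy is given at the concentration of interest, so that dividing the summed transfer free energy difference by [C] gives a slope. The function name, dictionary keys, and all numerical values are hypothetical, and sign conventions are simply taken as given.

```python
def tm_m_value(groups, conc):
    """Transfer Model estimate of m at denaturant concentration `conc` (M).

    Each entry of `groups` is a dict with hypothetical keys:
      n          number of residues of this type
      dg         transfer free energy of the group at `conc` (kcal/mol)
      dalpha     <alpha_D> - <alpha_N> for the group (A^2)
      alpha_gkg  surface area of the group in Gly-k-Gly (A^2)
    """
    total = sum(g["n"] * g["dg"] * g["dalpha"] / g["alpha_gkg"] for g in groups)
    return total / conc  # assumption: dg was measured at this same `conc`

# Hypothetical numbers, for illustration only.
groups = [
    {"n": 64, "dg": 0.04, "dalpha": 20.0, "alpha_gkg": 80.0},  # backbone-like
    {"n": 9, "dg": 0.01, "dalpha": 30.0, "alpha_gkg": 60.0},   # side-chain-like
]
m_est = tm_m_value(groups, conc=1.0)
```

If every `dg` doubles when the concentration doubles and the `dalpha` values stay fixed, the function returns the same m, which is the condition for a constant m-value.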

The linear variation of $\Delta G_{ND}([C])$ as [C] changes can be rationalized if (i) $\delta g_k^P([C])$ is directly proportional to [C], and (ii) $\Delta\alpha_k^P$ is [C]-independent. Experiments have shown that $\delta g_k^P([C])$ is a linear function of [C], while the near-independence of $\Delta\alpha_k^P$ on [C] can only be inferred based on the accuracy of the TM in predicting the m-values [15, 16]. In an apparent contradiction to such an inference, small angle X-ray scattering experiments [20-23] and single molecule FRET experiments [24-29] show that the denatured state properties, such as the radius of gyration $R_g$ and the end-to-end distance ($R_{ee}$), can change dramatically as a function of [C]. These observations suggest that the total solvent accessible surface area difference of the protein, $\Delta\alpha_T$ ($= \sum_{k=1}^{N_S} n_k \Delta\alpha_k^S + \sum_{k=1}^{N_B} n_k \Delta\alpha_k^B$), summed over the various groups, must also be a function of [C], since we expect that $\Delta\alpha_T$ must be a monotonically increasing function of $\Delta R_g$, which is the difference between $R_g$ of the D and N states. For compact objects $\Delta\alpha_T \propto \Delta R_g^2$, but for fractal structures the relationship is more complex [31]. Furthermore, NMR measurements have found that many proteins adopt partially structured or random coil-like conformations at high [C] [32-35], which necessarily have large fluctuations in global properties such as $\Delta\alpha_k^P$ and $R_g$. Thus, the contradiction between the constancy of m-values and the sometimes measurable changes in denatured state properties is a puzzle that requires a molecular explanation.



Bolen and collaborators have already shown that quantitative estimates of m can be made by using measured transfer free energies of individual groups [15, 16]. More importantly, these studies established that the dominant contribution to m arises from the backbone [15, 16]. However, only by characterizing the changes in the distribution of $\Delta\alpha_k^S$ and $\Delta\alpha_k^B$ as a function of [C] can the reasons for the success of the TM in obtaining the global property m be fully appreciated. This is one of the goals of the present study. In addition, we correlate m with denatured state collapse, [C]-dependent changes in residual structure, and the solution forces acting on the denatured state, properties that cannot be analyzed using the TM.

The denatured, and perhaps even the native, state should be described as an ensemble of fluctuating conformations; these will be referred to as the DSE and NSE (native state ensemble), respectively. As a result, it is crucial to characterize the distribution of various molecular properties in these ensembles, and how they change with [C], in order to describe quantitatively the properties of the DSE. Because the D state is an ensemble of conformations with a distribution of accessible surface areas, Eq. 1 should be considered an approximate expression for the m-value. Even if the basic premise of the TM is valid, we expect that $\Delta\alpha_k^P$ should depend on the conformation of the protein and the denaturant concentration. Consequently, the m-value should be written with an explicit concentration dependence as

$$ m([C]) = \frac{1}{[C]} \sum_{k=1}^{N_S} n_k \delta g_k^S([C]) \frac{\langle \alpha^S_{k,D}([C]) \rangle - \langle \alpha^S_{k,N}([C]) \rangle}{\alpha^S_{k,G-k-G}} + \frac{1}{[C]} \sum_{k=1}^{N_B} n_k \delta g_k^B([C]) \frac{\langle \alpha^B_{k,D}([C]) \rangle - \langle \alpha^B_{k,N}([C]) \rangle}{\alpha^B_{k,G-k-G}} \tag{2} $$

where $\langle \alpha^P_{k,j}([C]) \rangle = \int_0^\infty \alpha^P_{k,j} P(\alpha^P_{k,j}; [C]) \, d\alpha^P_{k,j}$ (j = D or N and P = S or B). In principle, the reference areas $\alpha^P_{k,G-k-G}$ in the denominators of Eq. 2 should also be [C]-dependent; however, we ignore this for simplicity. In contrast to Eq. 1, the conformational fluctuations in the DSE and NSE are taken into account in Eq. 2 by integrating over the distribution of surface areas, $P(\alpha^P_{k,j}; [C])$. Moreover, we do not assume that the surface area distributions are independent of [C], as is done in Eq. 1. Such an assumption can only be justified by evaluating $P(\alpha^P_{k,j}; [C])$ using molecular simulations or experiments.
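
To make the role of the surface area distributions concrete, the following sketch evaluates an average area as a discrete integral over a tabulated distribution and forms the D minus N difference. The Gaussian distributions and all names are hypothetical stand-ins, not simulation output.

```python
import numpy as np

def mean_area(alpha, density):
    """<alpha> as a Riemann sum of alpha * P(alpha) on a uniform grid."""
    alpha = np.asarray(alpha, dtype=float)
    density = np.asarray(density, dtype=float)
    # The uniform grid spacing cancels between numerator and normalization.
    return np.sum(alpha * density) / np.sum(density)

def delta_alpha(alpha, density_D, density_N):
    """Difference between the denatured and native average areas."""
    return mean_area(alpha, density_D) - mean_area(alpha, density_N)

# Hypothetical distributions (A^2), for illustration only.
alpha = np.linspace(0.0, 100.0, 101)
p_D = np.exp(-(alpha - 60.0) ** 2 / 50.0)
p_N = np.exp(-(alpha - 30.0) ** 2 / 50.0)
d_alpha = delta_alpha(alpha, p_D, p_N)
```

Repeating this calculation with distributions obtained at different [C] would show directly whether the area difference, and hence m, varies with concentration.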



We use the Molecular Transfer Model (MTM) [36] in conjunction with coarse-grained simulations of protein L using the C$_\alpha$ side chain model (C$_\alpha$-SCM) (see Methods) to examine the molecular origin of the constancy of m-values. Because the conformations and energies are known exactly in the C$_\alpha$-SCM simulations, we can determine how an ensemble of denatured conformations, with a distribution of solvent accessible areas in the DSE, gives rise to a constant m-value. We show that the m-values are nearly constant for two reasons: (1) As previously shown [15, 16], the bulk of the contribution to changes in $\Delta G_{ND}([C])$ comes from the protein backbone. (2) Here, we establish that the distribution of the backbone solvent accessible surface area is narrow, with small changes in $\Delta\alpha_k^B$ as [C] decreases.

Determination of the molecular origin of denatured state collapse, often associated with a concentration dependent m-value, requires characterizing the DSE of protein L at low [C] (< 3 M urea), where the NSE is thermodynamically favored. Under these conditions we find that the radius of gyration ($R_g$) of the DSE undergoes a significant reduction as [C] decreases. The urea-induced collapse transition of protein L is continuous as a function of [C], and results in native-like secondary structural elements. We decompose the non-bonded energy into residue-solvent and intrapeptide interactions and show that (1) these two opposing energies govern the behavior of $R_g$ of the DSE, and (2) the strengths of these interactions are non-uniformly distributed in the DSE and correlate with regions of residual structure. Thus, different regions of the DSE can collapse to varying degrees as [C] changes.



Methods: 

C$_\alpha$-side chain model for protein L: In order to ascertain the conditions under which Eq. 1 is a good approximation to Eq. 2, we use the coarse-grained C$_\alpha$-side chain model (C$_\alpha$-SCM) [37] to represent the sixty-four residue protein L. In the C$_\alpha$-SCM, each residue in the polypeptide chain is represented using two interaction sites, one that is centered on the $\alpha$-carbon atom and another at the center-of-mass of the side chain [37]. The potential energy ($E_P$) of a given conformation of the C$_\alpha$-SCM is a sum of bond-angle ($E_A$), backbone dihedral ($E_D$), improper dihedral ($E_C$), backbone hydrogen bonding ($E_{HB}$) and non-bonded Lennard-Jones ($E_{LJ}$) terms ($E_P = E_A + E_D + E_C + E_{HB} + E_{LJ}$). The functional form of these terms, and the derivation of the parameters used, are explained in the supporting information of reference [36].

Sequence information is included in the C$_\alpha$-SCM by using non-bonded parameters that are residue dependent. We take into account the size of a side chain by varying the collision diameter used in the $E_{LJ}$ term. The interaction strength between side chains i and j that are in contact in the native structure depends on the amino acid pair and is modeled by varying the well-depth ($\epsilon_{ij}$) in $E_{LJ}$. Thus, the C$_\alpha$-SCM incorporates both sequence variation and packing effects. Numerous studies have shown that considerable insights into protein folding can be obtained using coarse-grained models [38-40], thus rationalizing the choice of the C$_\alpha$-SCM in this study.

Simulation details: Equilibrium simulations of the folding and unfolding reaction using the C$_\alpha$-SCM were performed using Multiplexed-Replica Exchange (MREX) [41, 42] in conjunction with low friction Langevin dynamics [43] at [C] = 0. We used CHARMM to carry out the Langevin dynamics [44], while an in-house script handled the replica exchange calculation. In the MREX simulations, multiple independent trajectories are generated at several temperatures. In addition to the conventional replica exchange acceptance/rejection criteria for swapping conformations between different temperatures [41], MREX also allows exchange between replicas at the same temperature [42]. Replicas were run at eight temperatures: 315, 335, 350, 355, 360, 365, 380, and 400 K. At each temperature four independent trajectories were simulated simultaneously. The system configurations were saved for analysis every 5,000 integration time-steps. Random shuffling occurred between replicas at the same temperature with 50% probability. Exchanges between neighboring temperatures were attempted using the standard replica exchange acceptance criteria [41]. A Langevin damping coefficient of 1.0 ps$^{-1}$ was used, with a 5 fs integration time-step. In all, 90,000 exchanges were attempted, of which the first 10,000 were discarded to allow for equilibration. All trajectories were simulated in the canonical (NVT) ensemble.
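
For readers unfamiliar with the standard replica exchange acceptance criterion, a minimal sketch of the usual Metropolis rule for swapping configurations between two temperatures is given below. It is not the in-house script used in this work; the function name is hypothetical and energies are assumed to be in kcal/mol.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def swap_probability(E_i, T_i, E_j, T_j):
    """Metropolis probability of exchanging configurations between a
    replica at temperature T_i (energy E_i) and one at T_j (energy E_j)."""
    delta = (1.0 / (K_B * T_i) - 1.0 / (K_B * T_j)) * (E_i - E_j)
    # Accept outright when delta >= 0; this also avoids overflow in exp.
    return 1.0 if delta >= 0.0 else math.exp(delta)

# Two neighboring temperatures from the ladder above, illustrative energies.
p_same = swap_probability(-100.0, 350.0, -100.0, 355.0)
p_unfavorable = swap_probability(-110.0, 350.0, -100.0, 355.0)
```

Exchanges between replicas at the same temperature, as allowed in MREX, correspond to a vanishing exponent and are therefore always acceptable under this rule.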

Analysis with the Molecular Transfer Model: We model the denaturation of protein L by urea using the Molecular Transfer Model [36]. Previous work [36] has already shown that the MTM quantitatively reproduces experimentally measured single molecule FRET efficiencies [27, 28, 29] as a function of [C] (GdmCl) for protein L and the cold shock protein, thus validating the methodology. The MTM combines simulations at [C] = 0 with the TM [30], experimentally measured transfer free energies [15, 16], and a reweighting method to predict protein properties at any urea concentration of interest [36, 45, 46, 47]. Our previous work has shown that the MTM accurately predicts a number of molecular characteristics of proteins as a function of denaturant or osmolyte concentration [36]. The MTM equation, which has the form of the Weighted Histogram Analysis Method [46], is

$$ \langle A([C],T) \rangle = Z([C],T)^{-1} \sum_{l=1}^{R} \sum_{t=1}^{n_l} \frac{A_{l,t} \, e^{-\beta [E_P(l,t,[0]) + \Delta G_{tr}(l,t,[C])]}}{\sum_{m=1}^{R} n_m e^{f_m - \beta_m E_P(l,t,[0])}} \tag{3} $$

where $\langle A([C],T) \rangle$ is the average of a protein property A at urea concentration [C] and temperature T, and $Z([C],T)$ is the partition function. The sums in the numerator of Eq. 3 are over the R different replicas from the MREX simulations, which vary in temperature, and the $n_l$ protein conformations from the $l$-th replica. The value of A from replica l at time t is $A_{l,t}$, $E_P(l,t,[0])$ is the potential energy of that conformation at [C] = 0, and $\beta = 1/(k_B T)$, where $k_B$ is Boltzmann's constant. In Eq. 3, $\Delta G_{tr}(l,t,[C])$, the reversible work of transferring the (l,t) protein conformation from 0 M to [C] M urea solution, is estimated using a form of the TM, and is given by

$$ \Delta G_{tr}(l,t,[C]) = \sum_{k=1}^{N_S} n_k \frac{\delta g_k^S([C])}{\alpha^S_{k,G-k-G}} \alpha_k^S(l,t,[C]) + \sum_{k=1}^{N_B} n_k \frac{\delta g_k^B([C])}{\alpha^B_{k,G-k-G}} \alpha_k^B(l,t,[C]) \tag{4} $$

All terms in Eq. 4 are the same as in Eq. 2, except that instead of computing a difference in surface areas, only the surface areas from conformation (l,t), $\alpha_k^P(l,t,[C])$, are included. In the denominator of Eq. 3, the sum is over the different replicas, $n_m$ is the number of conformations from replica m, $\beta_m = 1/(k_B T_m)$, where $T_m$ is the temperature of the $m$-th replica, and the free energy $f_m$ of replica m is obtained by solving a self-consistent equation (see reference [45]).
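
The reweighting step can be sketched numerically as follows. This is a minimal illustration of a WHAM-style weighted average under stated assumptions, not the code used in this work: the conformations from all replicas are assumed to be pooled into flat arrays, the replica free energies are assumed to be already known, energies are in kcal/mol, and every name is hypothetical.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def mtm_average(A, E0, dG_tr, T, rep_T, rep_n, rep_f):
    """Reweighted average of property A at temperature T.

    A, E0, dG_tr : per-conformation property, [C]=0 potential energy,
                   and transfer free energy (pooled over all replicas)
    rep_T, rep_n, rep_f : per-replica temperature, number of
                   conformations, and dimensionless free energy
    """
    A, E0, dG_tr = (np.asarray(x, dtype=float) for x in (A, E0, dG_tr))
    beta = 1.0 / (K_B * T)
    beta_m = 1.0 / (K_B * np.asarray(rep_T, dtype=float))
    # log of the sum over replicas, one value per pooled conformation
    log_terms = (np.log(np.asarray(rep_n, dtype=float))[:, None]
                 + np.asarray(rep_f, dtype=float)[:, None]
                 - beta_m[:, None] * E0[None, :])
    log_den = np.logaddexp.reduce(log_terms, axis=0)
    log_w = -beta * (E0 + dG_tr) - log_den
    w = np.exp(log_w - log_w.max())  # common factor cancels in the ratio
    return np.sum(w * A) / np.sum(w)  # division by sum(w) plays the role of Z
```

Working with logarithms of the weights avoids overflow when the energies are large in units of $k_B T$. With a single replica, zero transfer free energy, and T equal to the simulation temperature, all weights are equal and the function reduces to a plain arithmetic mean.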

In computing $\alpha_k^P(l,t,[C])$ for use in Eq. 4 we use the radii listed in Table 1, where the backbone group corresponds to glycine. These parameters are different from the ones reported in [36]. They result in better agreement between m-values predicted using the MTM and m-values predicted from Auton and Bolen's implementation of the TM [15, 18]. The values for $\alpha^P_{k,G-k-G}$ used in Eq. 4 are reported in Table 2.

We calculate the average of a number of properties of protein L using Eq. 3. The end-to-end distance ($R_{ee}$) of a given conformation is the distance between the C$_\alpha$ sites at residues one and sixty-four. The radius of gyration, $R_g$, is computed using $R_g^2 = \frac{1}{2N - N_G} \sum_{i=1}^{2N - N_G} (\mathbf{r}_i - \mathbf{r}_{CM})^2$, where N is the number of residues, $N_G$ is the number of glycines in the sequence, $\mathbf{r}_i$ is the position of interaction site i, and $\mathbf{r}_{CM} = \frac{1}{2N - N_G} \sum_{i=1}^{2N - N_G} \mathbf{r}_i$ is the mean position of the $2N - N_G$ interaction sites of the protein. The solvent accessible surface area of a backbone or a side chain group ($\alpha_k^P$) in residue k in a given conformation was computed using the CHARMM program [44], which computes the analytic solution for the surface area. A probe radius of 1.4 Å, equivalent to the size of a water molecule, was used.
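
The radius of gyration defined above is an unweighted average over interaction sites, which a few lines of NumPy make explicit. The sketch assumes the coordinates of all interaction sites of one conformation are supplied as an array with one row per site; the function name is hypothetical.

```python
import numpy as np

def radius_of_gyration(sites):
    """R_g of a set of interaction sites, each given equal weight.

    `sites` has shape (number of sites, 3); for the two-bead model this is
    2N - N_G rows, since glycine carries no side chain bead.
    """
    sites = np.asarray(sites, dtype=float)
    r_cm = sites.mean(axis=0)  # mean position of the sites
    return np.sqrt(np.mean(np.sum((sites - r_cm) ** 2, axis=1)))

# Two sites 2 A apart: each lies 1 A from the mean position.
rg = radius_of_gyration([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
```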

The extent to which a structural element is formed (denoted $f_S$) in a conformation of protein L is defined by $Q_p$, the fraction of native backbone contacts formed by structural element p, where p = $\beta$-hairpin S12 or S34, or the $\beta$-strand pairing between S1 and S4. We define $Q_p$ as

$$ Q_p = \frac{1}{C_p} \sum_{j} \sum_{k \geq j+4} \Theta(R_C - d_{jk}) \tag{5} $$

where the sum is over the N = 64 C$_\alpha$ sites, restricted to pairs that form native contacts in element p, $R_C$ (= 8 Å) is a cutoff distance, $d_{jk}$ is the distance between interaction sites j and k, and $\Theta(R_C - d_{jk})$ is the Heaviside step function. Strand 1 (S1) corresponds to residues 4-11, S2 to residues 17-24, S3 to residues 47-52, and S4 to residues 57-62. In Eq. 5, $C_p$ is the maximum number of native contacts for structural element p. The extent of helix formation in a conformation r of protein L is computed as the ratio $N_\phi(r)/N_\phi(N)$, where $N_\phi(r)$ is the number of neighboring dihedral pairs, between residues 26 and 44, that have dihedral angles within ±20° of the dihedral's value in the native state, and $N_\phi(N) = 15$.
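
A fraction of native contacts of this kind can be computed with a simple distance test. The sketch below assumes that the list of native contact pairs belonging to a structural element has already been determined from the native structure; that list, the function name, and the example coordinates are hypothetical.

```python
import numpy as np

def fraction_native_contacts(coords, native_pairs, r_c=8.0):
    """Fraction of the listed native pairs whose sites lie within r_c (A).

    coords       : array of shape (number of sites, 3)
    native_pairs : list of (j, k) index pairs for one structural element
    """
    coords = np.asarray(coords, dtype=float)
    formed = sum(
        1 for j, k in native_pairs
        if np.linalg.norm(coords[j] - coords[k]) < r_c  # step function
    )
    return formed / len(native_pairs)

# Three sites on a line; pair (0, 1) is 5 A apart, pair (0, 2) is 20 A apart.
coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
q = fraction_native_contacts(coords, [(0, 1), (0, 2)])
```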

The non-bonded interaction energy $E_I$ in the C$_\alpha$-SCM is $E_I = E_{LJ} + E_{HB}$. We include only the Lennard-Jones (LJ) and hydrogen bond (HB) energies in $E_I$ [36]. The urea solvation energy, $E_S$, of a given conformation is set equal to Eq. 4. $E_M$ is a simple sum of $E_I$ and $E_S$. The values of $E_I$ and $E_S$ for the various structural elements of protein L were computed by neglecting non-bonded and solvation energies of residues that were not part of the structural element of interest.

The time-series of the various properties were inserted into Eq. 3 to compute their averages as a function of [C]. To compute the averages $\langle A_D \rangle$ and $\langle A_N \rangle$ of the DSE and NSE, respectively, a modification to Eq. 3 was made. The numerator was multiplied by $\Theta_n(l,t)$, a Heaviside step function that is equal to $\Theta(5 - \Delta(l,t))$ when the average of the NSE is computed (i.e. n = NSE) and equal to $\Theta(\Delta(l,t) - 5)$ when the average of the DSE is computed (i.e. n = DSE). Here, $\Delta(l,t)$ is the root mean squared deviation, in Å, between the C$_\alpha$ sites in the C$_\alpha$-SCM of conformation (l,t) and the C$_\alpha$ atoms in the crystal structure (PDB ID 1HZ6 [48]). When $\Delta(l,t)$ is greater than 5 Å, $\Theta(\Delta(l,t) - 5) = 1$ and $\Theta(5 - \Delta(l,t)) = 0$; when $\Delta(l,t)$ is less than 5 Å, $\Theta(\Delta(l,t) - 5) = 0$ and $\Theta(5 - \Delta(l,t)) = 1$.
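
The partition of conformations into the NSE and DSE by a 5 Å cutoff can be sketched as a masked weighted average. The weights are assumed to be the per-conformation reweighting factors already evaluated at the concentration of interest, and normalizing by the weights within each ensemble is an assumption of this sketch; all names and numbers are hypothetical.

```python
import numpy as np

def ensemble_average(A, weights, rmsd, ensemble, cutoff=5.0):
    """Weighted average of A over the NSE (rmsd < cutoff) or DSE (otherwise)."""
    A, weights, rmsd = (np.asarray(x, dtype=float) for x in (A, weights, rmsd))
    mask = rmsd < cutoff if ensemble == "NSE" else rmsd >= cutoff
    return np.sum(weights[mask] * A[mask]) / np.sum(weights[mask])

# Four illustrative conformations: two native-like, two denatured.
A = [1.0, 2.0, 10.0, 20.0]
w = [1.0, 1.0, 1.0, 3.0]
rmsd = [2.0, 3.0, 8.0, 9.0]
avg_nse = ensemble_average(A, w, rmsd, "NSE")
avg_dse = ensemble_average(A, w, rmsd, "DSE")
```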

Probability distributions were computed using $P(A \pm \delta_A; [C]) = Z(A \pm \delta_A, [C], T)/Z([C], T)$, where $Z(A \pm \delta_A, [C], T)$ is the restricted partition function as a function of A. Due to the discrete nature of the simulation data, a bin with finite width $\pm\delta_A$, whose value depends on A, is used. The restricted partition function is

$$ Z(A \pm \delta_A, [C], T) = \sum_{l=1}^{R} \sum_{t=1}^{n_l} \frac{f_A(l,t) \, e^{-\beta [E_P(l,t,[0]) + \Delta G_{tr}(l,t,[C])]}}{\sum_{m=1}^{R} n_m e^{f_m - \beta_m E_P(l,t,[0])}} $$

where all terms are the same as in Eq. 3 except for $f_A(l,t)$, which is a function that we define to equal 1 when the protein conformation (l,t) has a value of A in the range $A \pm \delta_A$, and zero otherwise.
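
In practice, a ratio of a restricted to a full partition function amounts to a weighted histogram. The sketch below assumes the per-conformation reweighting factors have already been computed, and uses fixed bin edges for simplicity, whereas the bin width in the text depends on A; the function name is hypothetical.

```python
import numpy as np

def weighted_distribution(A, weights, bin_edges):
    """P(bin) = (sum of weights of conformations in the bin) / (sum of all weights)."""
    A = np.asarray(A, dtype=float)
    weights = np.asarray(weights, dtype=float)
    restricted, _ = np.histogram(A, bins=bin_edges, weights=weights)
    return restricted / np.sum(weights)

# Three illustrative conformations falling into two bins.
p = weighted_distribution([1.0, 1.0, 3.0], [1.0, 1.0, 2.0], [0.0, 2.0, 4.0])
```

Conformations whose value of A falls outside all bins still contribute to the denominator, so the returned probabilities sum to one only when the bins cover the full range of A.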

Results and Discussion 

$\Delta G_{ND}([C])$ changes linearly as urea concentration increases: We chose the experimentally well characterized B1 IgG binding domain of protein L [27, 28, 49] to illustrate the general principles that explain the linear dependence of $\Delta G_{ND}([C])$ on [C] for proteins that fold in an apparent two-state manner. In our earlier study [36], we showed that the MTM accurately reproduces several experimental measurements, including the [C]-dependent energy transfer as a function of guanidinium chloride (GdmCl) concentration. Prompted by the success of the MTM, we now explore urea-induced unfolding of protein L. The MTM predictions for urea effects are expected to be more accurate than for GdmCl, since the experimentally measured $\delta g_k^P([C])$ urea data used in Eq. 4 include activity coefficient corrections while the GdmCl data do not. The calculated $\Delta G_{ND}([C])$ as a function of urea concentration for protein L shows a linear dependence above 4 M (Fig. 1), with m = 0.80 kcal mol$^{-1}$ M$^{-1}$ and $C_m \approx 6.6$ M (obtained using $\Delta G_{ND}([C_m]) = 0$). The consequences of the deviation from linearity, which is observed for [C] < 3 M, are explored below. It should be stressed that the error in the estimated $\Delta G_{ND}([0])$ is relatively small (~0.8 kcal mol$^{-1}$) if measurements at [C] > 4 M are extrapolated to [C] = 0 (Fig. 1b). Thus, from the perspective of free energy changes, the assumption that $\Delta G_{ND}([C]) = \Delta G_{ND}([0]) + m[C]$, with constant m, is justified for this protein.

Molecular origin of constant m-values 

Inspection of Eq. 2 suggests that there are three possibilities that can explain the constancy of m-values, thus making Eq. 1 a good approximation to Eq. 2. (1) Both $\langle \alpha^P_{k,D}([C]) \rangle$ and $\langle \alpha^P_{k,N}([C]) \rangle$ in Eq. 2 have the same dependence on [C], making $\Delta\alpha_k^P$ effectively independent of [C]. (2) The distributions $P(\alpha^P_{k,j}; [C])$ in Eq. 2 are sharply peaked about their mean or most probable values at all [C], thus making $\Delta\alpha_k^P$ independent of [C]. In particular, if the standard deviation of $\alpha^P_{k,D}$ (denoted $\sigma_{\alpha_{k,D}}$) is much less than $\langle \alpha^P_{k,D}([C]) \rangle$ for all [C], then the $\Delta\alpha_k^P$'s would be effectively independent of [C]. (3) One group in the protein, denoted l (the backbone in proteins), makes the dominant contribution to the m-value. In this case, only the changes in $\Delta\alpha_l^P$ and $P(\alpha^P_{l,j}; [C])$ matter, so that m is constant provided $\Delta\alpha_l^P$ is insensitive to [C]. The MTM simulations of protein L allow us to test the validity of these plausible explanations for the constancy of m-values, especially when [C] > 3 M (Fig. 1b). Only by examining these possibilities, which requires characterizing how the distributions of various properties change with [C], can the observed constancy of m be rationalized.

$\langle \alpha^P_{k,D}([C]) \rangle$ and $\langle \alpha^P_{k,N}([C]) \rangle$ do not have the same dependence on [C]: The changes in $\langle \alpha^P_{k,D} \rangle$ and $\langle \alpha^P_{k,N} \rangle$ as a function of [C] show that as [C] increases, both $\langle \alpha^P_{k,D} \rangle$ and $\langle \alpha^P_{k,N} \rangle$ increase (blue and green lines in Fig. 2). However, $\langle \alpha^P_{k,D}([C]) \rangle$ has a stronger dependence on [C] than $\langle \alpha^P_{k,N}([C]) \rangle$ for both the backbone and side chains (Fig. 2). Thus, the observed linear dependence of $\Delta G_{ND}([C])$ on [C] cannot be rationalized in terms of similarity in the variation of $\langle \alpha^P_{k,D}([C]) \rangle$ and $\langle \alpha^P_{k,N}([C]) \rangle$ as [C] changes. The stronger dependence of $\langle \alpha^P_{k,D}([C]) \rangle$ on [C] arises from the greater range and magnitude of the solvent accessible surface areas available to the DSE (see below). The greater range allows larger shifts in $\langle \alpha^P_{k,D}([C]) \rangle$ than in $\langle \alpha^P_{k,N}([C]) \rangle$ with [C]. Equally important, the strength of the favorable protein-solvent interactions is positively correlated with the magnitude of the surface area and with [C] (see Eq. 4). Thus, the DSE conformations, which have larger surface areas, are stabilized to a greater extent than the NSE conformations with increasing [C], and consequently $\langle \alpha^P_{k,D}([C]) \rangle$ shows a stronger dependence on [C].

Surface area distributions are broad in the DSE: The variation of $\Delta\alpha_k^S$ and $\Delta\alpha_k^B$ with [C] suggests that the $P(\alpha^P_{k,D}; [C])$ are not likely to be narrowly peaked, and must also depend on [C] (Eq. 2). As urea concentration increases, the total backbone surface area distribution in the DSE, $P(\alpha^B_D; [C])$, shifts towards higher values of $\alpha^B_D$ and becomes narrower (Fig. 3a). A similar behavior is observed in the distribution of the total surface area (Fig. 3b) and for the side chain groups (data not shown). It should be noted that the change in $\alpha^B_D$ with [C] is about five times smaller than the corresponding change in $\alpha_T$ (compare Figs. 3a and 3b). Thus, the distributions of surface areas for the various protein components are moderately dependent on [C], and $\Delta\alpha_T$ is more strongly dependent on [C] (Fig. 2, inset). These findings would suggest that m should be a function of [C] above 4 M (Eq. 2), in contradiction to the finding in Fig. 1b.

We characterize the width of the denatured state $P(\alpha^P_{k,D})$ distributions by computing the ratio $\rho_k = \sigma_{\alpha_{k,D}} / \langle \alpha^P_{k,D} \rangle$, where $\sigma^2_{\alpha_{k,D}} = \langle (\alpha^P_{k,D})^2 \rangle - \langle \alpha^P_{k,D} \rangle^2$. Fig. 4a shows $\rho_k$ as a function of [C] for the various protein components (backbone, side chains, and the entire protein). As with the backbone $P(\alpha^B_D)$ distribution (Fig. 3a), $\rho_k$ indicates that $P(\alpha^P_{k,D})$ becomes narrower at higher urea concentrations for most k (Fig. 4). At 8 M urea, the width of $P(\alpha^P_{k,D})$ ranges from 5 to 25% of the average value of $\alpha^P_{k,D}$ for all groups, except k = Trp, which has an even larger width. Clearly, $\rho_k$ is large at all [C], which accounts for the dependence of $\Delta\alpha_k^P$ on [C]. These results show that there are discernible changes in $\rho_k$, which reflect the variations in $P(\alpha^P_{k,D}; [C])$ as [C] is changed. Consequently, the constancy of the m-value cannot be explained by narrow surface area distributions.
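
The relative width of a surface area distribution is the ratio of its standard deviation to its mean, which can be computed from weighted samples as sketched below. The weights are assumed to be reweighting factors for the DSE conformations at a given [C]; the function name and the numbers are hypothetical.

```python
import numpy as np

def relative_width(alpha, weights):
    """Ratio of the weighted standard deviation of alpha to its weighted mean."""
    alpha = np.asarray(alpha, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(alpha, weights=weights)
    mean_sq = np.average(alpha ** 2, weights=weights)
    sigma = np.sqrt(max(mean_sq - mean ** 2, 0.0))  # guard against round-off
    return sigma / mean

# Two equally weighted areas, 90 and 110 A^2: mean 100, standard deviation 10.
rho = relative_width([90.0, 110.0], [1.0, 1.0])
```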

The weak dependence of changes in accessible surface area of the protein backbone on [C] controls the linear behavior of $\Delta G_{ND}([C])$: Plots of $\Delta G_{ND}([C])$ at several urea concentrations for the entire protein, the backbone groups (second term in Eqs. 1 and 2), and the hydrophobic side chains Phe, Leu, Ile, and Ala are shown in Fig. 1b. The slope of these plots is the m-value, which in the transition region (i.e. from 5.1 M to 7.9 M urea) is 0.80 kcal mol$^{-1}$ M$^{-1}$ for the entire protein. The contribution from the backbone alone is 0.76 kcal mol$^{-1}$ M$^{-1}$, and that from the most prominent hydrophobic side chains (Phe, Leu, Ile, and Ala) is a combined 0.04 kcal mol$^{-1}$ M$^{-1}$. Thus, the largest contribution to the change in the native state stability as [C] is varied comes from the burial or exposure of the protein backbone (95%). The simulations directly support the previous finding that the protein backbone contributes the most to the stability changes with [C] [15, 16]. Thus, for [C] > 3 M the magnitude of the m-value is largely determined by the backbone groups. However, only by evaluating the [C]-dependent changes in the distribution of surface areas can one assess the extent to which Eq. 2 can be approximated by Eq. 1.

The change in accessible surface area of the backbone, $\Delta\alpha^B$, has a relatively weak urea dependence between 4 M and 8 M urea, increasing by only 75 Å$^2$. Such a small change in $\Delta\alpha^B$ with [C] has a negligible effect on the m-value. These results show that m is effectively independent of [C] in the transition region because $\Delta\alpha^B([C])$, associated with the backbone groups, changes by only a small amount as [C] changes, despite the fact that $\Delta\alpha_T$ can change appreciably ($\Delta\alpha_T$(4 M $\to$ 8 M) $\approx$ 300 Å$^2$; Fig. 2, inset). Thus, the third possibility is correct, namely that the weak dependence of $\Delta\alpha^B([C])$ on [C] results in m being constant.



Residual denatured state structure leads to the inequivalence of amino acids: In applying Eq. 1 to predict m-values, it is assumed that all residues of type k, regardless of their sequence context, have the same solvent accessible surface area in the DSE. Our simulations show that this assumption is incorrect. Comparison of $\alpha^P_{k,D}$ for individual residues of type k with the average $\langle \alpha^P_{k,D} \rangle$ as a function of urea concentration (Fig. 2) shows that both sequence context and the distribution of conformations in the DSE determine the behavior of a specific residue. Large differences in $\alpha^P_{k,D}$ are observed between residues of the same type, including alanine, phenylalanine and glutamate groups, even at high urea concentrations (Fig. 2). The inequivalence of a specific residue in the DSE is similar to NMR chemical shifts, which are determined by the local environment. As a result of variations in the local environment, not all alanines in a protein are equivalent. Thus, ignoring the unique surface area behavior of individual residues in the DSE could lead to errors in the predicted m-value. Because the backbone dominates the transfer free energy of the protein, errors arising from this assumption may be small. However, the dispersion in the backbone $\alpha^B_{k,D}$ suggests that different regions of the protein may collapse in the DSE at different urea concentrations, driven by differences in $\Delta\alpha^B_k$ from residue to residue (see below).

The simulations can be used to calculate [C]-dependent changes in surface areas of the individual backbone groups as well as side chains. Interestingly, even for the chemically homogeneous backbone group, significant dispersion about $\langle \alpha^B_{k,D} \rangle$ is observed when individual residues are considered (Fig. 2). For example, $\alpha^B_{k,D}$ for residue 10 changes more drastically as [C] decreases than it does for residues 20 or 50. Thus, the connectivity of the backbone group can alter not only the conformations as [C] is varied but also the contribution to the free energy.

Even more surprisingly, the changes in $\langle\alpha^{S}_{k=\mathrm{Ala},D}\rangle$ depend on the sequence location of a
given alanine residue and the associated secondary structure adopted in the native
conformation. The values of $\alpha^{S}_{\mathrm{Ala},D}$ for residues 8 and 20, both of which adopt a β-strand
conformation in the native structure (Fig. 2b), exhibit similar changes upon a decrease in
[C] (Fig. 2a). By comparison, surface area changes in alanine residues 29 and 33, which are
helical in the native state (Fig. 2b), are similar as [C] varies, while the changes in $\alpha^{S}_{\mathrm{Ala},D}$
for alanines that are in the loops (residues 13 and 63) are relatively small. The
probability distributions of surface areas for the individual alanines ($P(\alpha_{\mathrm{Ala},D})$, Fig. 5),
which determine the average surface area and higher order moments, reveal a wide
variability between different residues. Similar conclusions can be drawn by analyzing
the results for the larger hydrophobic residue Phe and the charged Glu (Fig. 2a). Thus,
for a given amino acid type, both sequence context and the heterogeneous nature
of structures in the DSE lead to a dispersion about the average $\langle\alpha^{P}_{k,D}\rangle$ and higher order
moments of $P(\alpha^{P}_{k,D})$ as urea concentration changes. Much like the chemical shifts in NMR,
the distribution functions of chemically identical individual residues bear signatures of their
environment and the local structures they adopt as [C] is varied.
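
As a minimal sketch of how the average and the relative width of a per-residue surface-area distribution could be extracted from sampled conformations, the snippet below computes the mean, the standard deviation, and their ratio. The function name `area_moments` and the sample values are hypothetical and purely illustrative; they are not data from the simulations.

```python
import math

def area_moments(samples):
    """Return (mean, std, relative width) of surface-area samples.

    samples: solvent accessible surface areas (e.g. in A^2) of one residue,
    collected over conformations of the denatured state ensemble.
    """
    values = list(samples)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(values) / n
    # Population variance: the second central moment of the distribution.
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    rho = std / mean if mean != 0 else float("nan")
    return mean, std, rho

# Hypothetical samples for two chemically identical residues (illustrative only).
residue_a = [40.0, 60.0, 80.0, 100.0]
residue_b = [68.0, 70.0, 70.0, 72.0]
mean_a, std_a, rho_a = area_moments(residue_a)
mean_b, std_b, rho_b = area_moments(residue_b)
```

In this toy case the two residues share the same average area but differ in relative width, which is the kind of residue-to-residue difference that the average alone would hide.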

The total surface area difference between N and D ($\Delta\alpha_T$) changes by about 1,200 Å²
as [C] decreases from 8 M to 0 M (see inset of Fig. 2c). Decomposition of $\Delta\alpha_T$ into
contributions from backbone and side chains (Eqs. 1 and 2) shows that the burial of the
backbone groups contributes the most (up to 38%) to $\Delta\alpha_T$ (Fig. 2c). Not unexpectedly,
hydrophobic residues (Phe, Ile, Ala, Leu), which are buried in the native structure, also
contribute significantly to $\Delta\alpha_T$, which supports the recent all atom molecular dynamics
simulations [50]. Among them, Phe, a bulky hydrophobic residue, makes the largest side
chain contribution to $\Delta\alpha_T$ (Fig. 2c). For example, as urea concentration increases from 4 M
to 8 M the total backbone $\Delta\alpha^{B}$ increases by 75 Å², and $n_k\Delta\alpha^{S}_{k}$ for k = Phe, Leu, Ala, Ile
increase by 21-42 Å².

The dispersion in $\alpha^{S}_{k,D}$ could be caused by residual structure in the DSE [51, 52]. We
test this proposal quantitatively by plotting $\alpha^{S}_{i,D}/\alpha^{S}_{k,\max}$ for each residue $i$, where $\alpha^{S}_{k,\max}$ is
the maximum $\alpha^{S}_{i,D}$ value for residue type $k$ in 8 M urea. If residual structure causes the
dispersion in $\alpha^{S}_{k,D}$, then we expect that $\alpha^{S}_{i,D}/\alpha^{S}_{k,\max}$ should depend on the secondary structure
element that the residue adopts in the native state. We find that there is a correlation
between this ratio and the helical secondary structure element (residues 26 to 44, Fig. 6). The
helical region tends to have smaller $\alpha^{S}_{i,D}/\alpha^{S}_{k,\max}$ values compared to other regions of the
protein. Of the nine alanines in protein L, four are found in the helical region of the protein.
These four residues have some of the smallest $\alpha^{S}_{i,D}/\alpha^{S}_{k,\max}$ values out of the nine alanines.
The [C]-dependent fraction of residual secondary structure in the DSE shows that at 8 M
urea the helical content is 32% of its value in the native state (Fig. 7a). Taken together,
these data show that $\alpha_{k,D}$ depends not only on the residue type, but also on the residual
structure present in the DSE, which, at all values of [C], is determined by the polymeric
nature of proteins.



Residue-dependent variations in the transition midpoint - The Holtzer Effect:
Globally, the denaturant-induced unfolding of protein L may be described using the
two state model (Fig. 1b). However, deviations from an all-or-none transition can be
discerned if the residue-dependent transitions $C_{m,i}$ can be measured. For strict two-state
behavior, $C_{m,i} = C_m$ for all $i$, where $C_{m,i}$ is the urea concentration below which the $i$th
residue adopts its native conformation. The inequivalence of the amino acids, described
above (Fig. 2), should lead to a dispersion in $C_{m,i}$. The values of $C_{m,i}$ are determined by
specific interactions, while the dispersion in $C_{m,i}$ is a finite-size effect [53, 54]. In other
words, because the number of amino acids ($N$) in a protein is finite, all thermodynamic
transitions are rounded instead of being infinitely sharp. Finite-size effects on phase
transitions have been systematically studied in spin systems [55] but have received much
less attention in biopolymer folding [53]. Klimov and Thirumalai [53] showed that the
dispersion in the residue-dependent melting temperatures $T_{m,i}$, denoted $\Delta T$ ($\Delta C$), for
temperature (denaturant) induced unfolding scales as $\Delta T/T_m \sim 1/N$ ($\Delta C/C_m \sim 1/N$).
The expected dispersion in $C_{m,i}$ or $T_{m,i}$ is the Holtzer effect.

In the context of proteins, Holtzer and coworkers [56] were the first to observe that
although globally thermal folding of the 33-residue GCN4-lzK peptides can be described
using the two state model, there is dispersion in the melting temperature throughout the
protein's structure. In accord with expectations based on the finite size of GCN4-lzK, it
was found, using one-dimensional NMR experiments, that $T_{m,i}$ depends on the sequence
position. The deviation of $T_{m,i}$ from the global melting temperature is as large as 20% [56].
More recently, large deviations in $T_{m,i}$ from $T_m$ have been observed for other proteins [57].

We have determined, for protein L, the values of $C_{m,i}$ using $Q_i(C_{m,i}) = 0.5$, where $Q_i$
is the fraction of native contacts for the $i$th residue. The distribution of $C_{m,i}$ shows the
expected dispersion (Fig. 8a), which implies that different residues can order at different values
of [C]. The precise $C_{m,i}$ values are dependent on the extent of residual structure adopted
by the $i$th residue, which will clearly depend on the protein. Similarly, the distribution of
the melting temperatures of individual residues $T_{m,i}$, calculated using $Q_i(T_{m,i}) = 0.5$, also
shows variations from $T_m$. However, the width of the thermal dispersion is narrower than that
obtained from denaturant-induced unfolding (Fig. 8b). This result is in accord with the
general observation that thermal melting is more cooperative than denaturant-induced
unfolding [58]. It should be emphasized that the Holtzer effect is fairly general, and only as
$N$ increases will $\Delta C$ and $\Delta T$ decrease.
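
The criterion $Q_i(C_{m,i}) = 0.5$ can be illustrated with a short sketch that locates the crossing by linear interpolation between tabulated concentrations. The function name `midpoint`, the interpolation scheme, and the two example curves are assumptions made for illustration; they are not the procedure or the data used in the simulations.

```python
def midpoint(conc, q, threshold=0.5):
    """Linearly interpolate the concentration at which q crosses threshold.

    conc: denaturant concentrations in increasing order.
    q:    fraction of native contacts of one residue at each concentration.
    Returns None if q never crosses the threshold.
    """
    for k in range(len(conc) - 1):
        q0, q1 = q[k], q[k + 1]
        # A sign change of (q - threshold) brackets the crossing.
        if (q0 - threshold) * (q1 - threshold) <= 0 and q0 != q1:
            frac = (threshold - q0) / (q1 - q0)
            return conc[k] + frac * (conc[k + 1] - conc[k])
    return None

# Hypothetical Q_i([C]) curves for two residues (illustrative only).
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
q_res1 = [0.9, 0.9, 0.8, 0.6, 0.2]
q_res2 = [0.9, 0.8, 0.7, 0.3, 0.1]
cm_1 = midpoint(conc, q_res1)
cm_2 = midpoint(conc, q_res2)
spread = abs(cm_1 - cm_2)  # a crude measure of the dispersion in C_{m,i}
```

Applying the same function to $Q_i$ tabulated against temperature would give $T_{m,i}$ in the same way.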



Specific protein collapse at low [C], and the balance between solvation
and intraprotein interaction energies: As [C] is decreased below 3 M there is a
deviation from linearity of $\Delta G_{ND}([C])$ (Fig. 1b) and the m-value depends on [C]. At low
[C] values the characteristics of the denatured state change significantly relative to the
denatured state at 8 M. The radius of gyration and $\Delta\alpha_T$ change by up to 6 Å (Fig. 9)
and 1,150 Å² (Fig. 2c inset) respectively, indicating that the denatured state undergoes a
collapse transition. We detail the consequences of the [C]-dependent changes and examine
the nature and origin of the collapse transition.
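
A minimal sketch of how an m-value and a midpoint could be extracted from the folded-state probability is given below, using $\Delta G_{ND}([C]) = -k_B T \ln(P_N/(1 - P_N))$ followed by a least-squares line. The data are synthetic and exactly two-state; the slope `M_TRUE` is a hypothetical value, and the helper names are illustrative rather than part of the MTM implementation.

```python
import math

K_B = 0.0019872  # kcal/(mol K); Boltzmann constant per mole (gas constant)

def stability(p_native, temperature):
    """Delta G_ND = -k_B T ln(P_N / (1 - P_N)), in kcal/mol."""
    return -K_B * temperature * math.log(p_native / (1.0 - p_native))

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y versus x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic two-state data (illustrative only, not simulation results).
T = 328.0        # K
M_TRUE = 1.0     # kcal/(mol M), hypothetical slope
CM_TRUE = 6.56   # M, used here only to generate the synthetic curve
conc = [5.0 + 0.5 * k for k in range(7)]  # 5.0 ... 8.0 M
p_n = [1.0 / (1.0 + math.exp(M_TRUE * (c - CM_TRUE) / (K_B * T))) for c in conc]

dg = [stability(p, T) for p in p_n]
m_value, intercept = linear_fit(conc, dg)
c_mid = -intercept / m_value  # concentration at which Delta G_ND = 0
```

For real data, residuals of the fit outside the fitted window would be one way to expose a [C]-dependent m-value at low denaturant concentration.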

Surface area changes: Above 4 M urea, the $\alpha_{k,D}$ values change only modestly (Fig. 2a).
However, below 4 M much larger changes in $\alpha_{k,D}$ occur (Fig. 2a). In particular, $\Delta\alpha_T$
decreases by 850 Å² going from [C] = 4 M to [C] = 0 M urea, compared to ~300 Å² upon
decreasing [C] from 8 M to 4 M urea (Fig. 2c inset). The backbone is the single greatest
contributor to $\Delta\alpha_T$, accounting for 24% to 38% of $\Delta\alpha_T$ at various [C]. Thus, a significant
amount of backbone surface area in the DSE is buried from solvent as [C] is decreased, and
the protein becomes compact (Fig. 9). The next largest contribution to $\Delta\alpha_T$, as measured
by $n_k\Delta\alpha_k$ $(= n_k(\langle\alpha_{k,D}([C])\rangle - \langle\alpha_{k,N}([C])\rangle))$, arises from the hydrophobic residues Phe, Ile,
and Ala (Fig. 2c). These residues also exhibit relatively large changes in the DSE surface
area as [C] is decreased. The large change in surface area of Phe as [C] decreases shows that
dispersion interactions also contribute to the energetics of folding [50]. On the other hand,
for side chains that are solvent exposed in the native state, such as the charged residue Asp,
$n_k\Delta\alpha_k$ is small and does not change significantly with [C] (Fig. 2c). The results in Fig. 2,
together with the surface area dependence of the TM, suggest that the changes in surface area
at low [C] are related to changes in solvation energy of the backbone (see below).

$R_g$ and $R_{ee}$ changes: Decreasing [C] below 4 M leads to an $R_g$ change of up to 4 Å, and
an end-to-end distance ($R_{ee}$) change of up to 10 Å (Fig. 9). Such a large change in $R_g$
shows that a collapse transition occurs in the DSE. We find no evidence (e.g. a sigmoidal
transition in $R_g$ versus [C]) that the DSE at 0 M ($\langle R_g^{D}\rangle$ = 15.5 Å) and the DSE at 8
M urea ($\langle R_g^{D}\rangle$ = 21.5 Å) are distinct thermodynamic states. This suggests that the
urea-induced DSE undergoes a continuous second order collapse transition as urea concentration
decreases.

Residual structure changes: To gain insight into secondary structure changes that occur
during the collapse transition we plot the residual secondary structure ($f_S$) in the DSE
versus [C] (Fig. 7a). Above 4 M urea only β-hairpin 3-4 and the helix are formed to any
appreciable extent. However, below 4 M β-hairpin 1-2 and β-sheet interactions between
strands 1 and 4 can be found in the DSE. For example, at 1 M urea β-hairpin 1-2 and
strands 1 and 4 are formed 21% and 16% of the time, while there is 56% helical and 74%
β-hairpin 3-4 content in the DSE (Fig. 7a). Thus, as [C] is decreased, the residual structure
in the DSE increases, contributing to changes in $R_g$, $R_{ee}$, and the surface areas. This finding
suggests that the collapse transition is specific in nature, leading to compact structures with
native-like secondary structure elements.

Solvation versus intraprotein interactions: Neglecting changes in protein conformational
entropy, two opposing energies control the [C]-dependent behavior of $R_g$: the interaction of
the peptide residues with solvent (the solvation energy, denoted $E_S$), and the intraprotein
non-bonded interactions between the residues (denoted $E_I$). For denaturants, such as urea,
$E_S$ favors an increase in $R_g$ and a concomitant increase in solvent accessible surface area,
while $E_I$ typically is attractive and hence favors a decrease in $R_g$. Because $E_S$ in the TM
is proportional to a surface area term, and $E_I$ is likely to be approximately proportional
to the number of residues in contact (which increases as the residue density increases
upon collapse), we expect $E_S([C]) \propto -[C](R_g^{D}([C]))^{2}$ and $E_I([C]) \propto -1/(R_g^{D}([C]))^{3}$. The
behavior of these two functions (increasing $R_g^{D}([C])$ leads to a more favorable $E_S([C])$
and a less favorable $E_I([C])$) suggests that there should always be some contraction
(expansion) of the DSE with decreasing (increasing) [C]. The molecular details in the Cα-SCM
allow us to exactly determine $E_S([C])$ and $E_I([C])$ as a function of [C], and thereby get an
understanding of the energy scales involved in the specific collapse of the DSE.
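
The trend implied by this Flory-like argument can be checked numerically with a rough sketch. It assumes the scaling forms $E_S \propto -[C] R_g^2$ and $E_I \propto -1/R_g^3$ with unit prefactors; the prefactors `a` and `b` and the function names are hypothetical, so only the signs and trends are meaningful, not the magnitudes.

```python
def solvation_energy(conc, rg, a=1.0):
    """E_S ~ -a [C] R_g^2 (arbitrary units); more negative is more favorable."""
    return -a * conc * rg ** 2

def intraprotein_energy(rg, b=1.0):
    """E_I ~ -b / R_g^3 (arbitrary units); more negative is more favorable."""
    return -b / rg ** 3

# Average DSE radii of gyration (Angstrom) quoted for 0 M and 8 M urea.
RG_COMPACT, RG_EXPANDED = 15.5, 21.5

for conc in (1.0, 4.0, 8.0):
    # Energy change on expanding the chain from the compact to the expanded size.
    d_es = solvation_energy(conc, RG_EXPANDED) - solvation_energy(conc, RG_COMPACT)
    d_ei = intraprotein_energy(RG_EXPANDED) - intraprotein_energy(RG_COMPACT)
    print(f"[C] = {conc:.0f} M: dE_S = {d_es:.1f}, dE_I = {d_ei:.2e}")
```

Under these assumptions the solvation gain from expansion grows linearly with [C] while the loss of intraprotein attraction does not depend on [C], which is the qualitative competition described above.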

In the inset of Fig. 7b we plot $E_S([C])$, $E_I([C])$, and $E_M([C])$ $(= E_S([C]) + E_I([C]))$
in the DSE. As indicated by the Flory-like argument given above, $E_S([C])$ becomes more
favorable with increasing [C], and $E_I([C])$ becomes more unfavorable with increasing [C]
(Fig. 7b inset). The behavior of $E_M([C])$ is important to examine, as this quantity governs
the behavior of $R_g([C])$. Above 4 M, $E_M([C])$ is relatively constant, varying by no more than
1 kcal/mol. This finding is consistent with the small changes in $R_g$, $R_{ee}$, and $\Delta\alpha_T$ above
4 M urea (Figs. 9 and 2c). Below 4 M, the $E_M([C])$ strength increases and is dominated
by the attractive intrapeptide interactions ($E_I([C])$) at low [C] (Fig. 7b inset), driving the
collapse of the protein as measured by $R_g$.

We dissect the monomer interaction energies further by computing the average monomer
interaction energy per secondary structural element (Fig. 7b). Above 4 M urea, the
monomer interaction energies change by less than 0.4 $k_BT$, except for the β-hairpin 3-4
which changes by as much as ~0.9 $k_BT$. Below 4 M the monomer interaction energies
change by as much as 1.5 $k_BT$, with the helix exhibiting the smallest change with [C].
These findings, which are in accord with changes in residual secondary structure (Fig. 7a),
indicate that the magnitude of the driving forces for specific collapse (measured by the change
in $E_M$ with [C]) are (from greatest to least) associated with β-hairpin 3-4 > β-strands 1-4 >
β-hairpin 1-2 > helix. Thus, the forces driving collapse are non-uniformly distributed
throughout the native state topology.



Concluding remarks 



The major findings in this paper reconcile the two-state interpretation of denaturant
m-values with the broad ensemble of conformations in the unfolded state, and resolve
an apparent conundrum between protein collapse and the linear variation of $\Delta G_{ND}([C])$
with [C]. The success of the TM in estimating m-values [15, 16] suggests that the
free energy of the protein can be decomposed into a sum of independent transfer energies
of backbone and side chain groups (Eq. 1). However, in order to connect the measured
m-values to the heterogeneity in the molecular conformations it is necessary to examine
how the distribution of the DSE changes as [C] changes. This requires an examination of
the validity of the second, more tenuous assumption in the TM, according to which the
denatured ensemble surface area exposures of the backbone and side chains do not change as
[C] changes. This assumption, whose validity has not been examined until the present work,
implies that neither the polymeric nature of proteins, the presence of residual structure in
the DSE, nor the extent of protein collapse alters $\langle\alpha_{k,D}([C])\rangle$ or $\langle\alpha_{k,N}([C])\rangle$ significantly.
Our work shows that as urea concentration (or more generally the concentration of any
denaturant) changes there are substantial changes in $P(\alpha_T)$ (Fig. 3b), $R_g$, and $R_{ee}$ (Fig. 9).
However, because backbone groups, whose $\alpha_{k,D}$ values are more narrowly distributed than
almost all other groups (see Fig. 4a), make the dominant contribution to the m-value (see
Fig. 4b), the m-value is constant in the transition region. Therefore, approximating Eq. 2
using Eq. 1 causes only small errors in the range of 3 M to 8 M urea for protein L.

The utility of the TM in yielding accurate values of m using measured transfer free
energies of isolated groups, without taking the polymer nature of proteins into account, has
been established in a series of papers [15, 16]. The success of the empirical TM (Eq. 1), with
its obvious limitations, has been rationalized [15, 16] by noting that the backbone makes
the dominant contribution to m. The present work expands further on this perspective
by explicitly showing that the total backbone surface area change ($\Delta\alpha^{B}$) varies only weakly
with [C] (for [C] > 3 M for protein L). We conclude that Eq. 1, with the assumption that
changes in surface areas are approximately [C]-independent, is reasonable. This finding, to
our knowledge, has not been demonstrated previously. We ought to emphasize that m, a
single parameter, is only a global descriptor of the properties of a protein at [C] ≠ 0. Full
characterization of the DSE requires calculation of changes in the distribution functions of a
number of quantities (see Figs. 3 and 5) as a function of [C]. This can only be accomplished
using MTM-like simulations and/or NMR experiments, which are by no means routine. The
paucity of NMR studies that have characterized [C]-dependent changes in the DSE, at the
residue level, shows the difficulty in performing such experiments.

The MTM simulations show discernible deviations from linear behavior at [C] < 3 M
(Fig. 1b), which can be traced to changes in the backbone surface area in the DSE. The
structural characteristics of the unfolded state under such native conditions are different
from those at [C] >> $[C_m]$. The values of $\Delta\alpha^{B}$ are relatively flat when [C] > $[C_m]$ (Fig.
2c) but decrease below $[C_m]$ because of protein collapse. Because $\delta g^{B}([C])$ dominates even
below $[C_m]$ (Fig. 4b) it follows that departure from linearity in $\Delta G_{ND}([C])$ is largely due to
burial of the protein backbone. The often-observed drift in baselines of spectroscopic probes
of protein folding may well be indicative of the changes in $\Delta\alpha^{B}$, and reflect the changing
distribution of unfolded states [59]. Single molecule experiments [24, 25, 27, 29], which
directly probe changes in the DSE even below $[C_m]$, exhibit large shifts in the distribution
of FRET efficiencies with [C]. Our simulations are consistent with these observations. The
logical interpretation is that the DSE and, in particular, the distribution of $\alpha_T$, $\alpha_B$, and the
radius of gyration $R_g$ must be [C]-dependent. The present simulations suggest that only by
carefully probing these distributions can the replacement of Eq. 2 by Eq. 1 be quantitatively
justified. In particular, large changes in the DSE occur under native conditions. Therefore,
it is important to characterize the DSE under native conditions to monitor the collapse of
proteins.

Equilibrium SAXS experiments on protein L at various guanidinium chloride
concentrations found that $R_g$ does not change significantly above $[C_m]$ [60]. The ~2 Å change
in $R_g$ above $[C_m]$ observed in these simulations is within the ~±1.8 Å error bars of the
experimentally measured $R_g$ above $[C_m]$ [60]. Our findings also suggest that the largest
change in $R_g$ occurs well below $[C_m]$ (3 M urea or less). Under these conditions the
fraction of unfolded molecules is less than 1% (Fig. 1b inset), which implies that it is difficult
to accurately measure the $R_g$ of the DSE using current SAXS experiments and explains
why the equilibrium collapse transitions are not readily observed in scattering experiments.
The present work and increasing evidence from single molecule FRET experiments show
that the denatured state can undergo a continuous collapse transition that is modulated
by changing solution conditions. This finding underscores the importance of quantitatively
characterizing the DSE in order to describe the folding reaction. Establishing whether the
collapse transition is second order, which is most likely the case, will require tests similar
to those proposed by Pappu and coworkers [61].

TABLE I: van der Waals radius of the side chain beads for various amino acids, based in part on measured partial molar volumes [62].

Residue    Radius (Å)
Ala        2.14
Cys        2.33
Asp        2.37
Glu        2.52
Phe        2.70
Gly        2.70
Hsd^a      2.63
Ile        2.63
Lys        2.70
Leu        2.63
Met        2.63
Asn        2.33
Pro        2.36
Gln        2.56
Arg        2.79
Ser        2.20
Thr        2.39
Val        2.49
Trp        2.88
Tyr        2.75

^a The same value of the radius was used regardless of the protonation state.



TABLE II: Solvent accessibility of the backbone and side chain groups of residue k in the tripeptide Gly-k-Gly ($\alpha_{\mathrm{Gly}-k-\mathrm{Gly}}$, in Å²).

k          Backbone (Å²)    Side chain (Å²)
Ala        62.5             108.3
Met        50.3             164.7
Arg        46.2             186.0
Gln        52.1             155.4
Asn        55.6             138.7
Gly        85.0             0.0
Tyr        47.3             179.9
Asp        56.7             133.7
Trp        43.8             198.7
Phe        48.3             174.6
Cys        57.7             128.6
Pro        56.9             132.7
Lys        48.3             174.6
Hsd^a      51.4             159.2
Hse        51.6             159.2
Hsp        51.4             159.2
Ser        60.9             114.9
Thr        56.2             135.7
Val        53.8             147.1
Ile        50.3             164.7
Glu        53.0             150.8
Leu        50.3             164.6

^a Hsd: neutral histidine, proton on ND1 atom. Hse: neutral histidine, proton on NE2 atom. Hsp:
protonated histidine.

Figure Captions



Figure 1 (a) The transfer free energy of the backbone (the glycine residue) and
side chain groups as a function of urea concentration. The lines are a linear extrapolation
of the experimentally measured $\delta g_k$ upon transfer from 0 M to 1 M urea [15]. The
amino acid corresponding to a given line is labeled using a three letter abbreviation.
Blue labels are for hydrophobic side chains, while red labels indicate polar or charged
side chains according to the hydrophobicity scale in {gsI. (b) The native state sta- 
bility (black circles) of protein L as a function of urea concentration, [C], at 328 K. 
$\Delta G_{ND}([C]) = -k_B T \ln(P_N([C])/(1 - P_N([C])))$, where $P_N([C])$ is the probability of being
folded as a function of [C]. The midpoint of the transition is $C_m$ = 6.56 M urea. The red line
is a linear fit to the data in the range of 5.1 to 7.9 M. At [C] < 3 M there is a departure
from linearity (i.e. a [C]-dependent m-value). Inset in the upper left is a ribbon diagram
of the crystal structure of protein L [48]. Inset in bottom right shows $P_N([C])$ versus [C]
at 328 K (blue line). In addition, $|dP_N/d[C]|$, the absolute value of the derivative of $P_N$
versus [C], is shown (green line). The full width at half the maximum value of $|dP_N/d[C]|$
(denoted $2\delta C$) is 2.8 M and defines the 'transition region', given by $C_m \pm \delta C$.

Figure 2 (a) $\langle\alpha^{P}_{k,j}\rangle$ versus urea concentration for the backbone and the side chains
alanine, phenylalanine, and glutamate, computed using
$\langle\alpha^{P}_{k,j}([C])\rangle = \int \alpha^{P}_{k,j}\, P(\alpha^{P}_{k,j}; [C])\, d\alpha^{P}_{k,j}$
($j$ = D or N and $P$ = S or B). For the backbone, $\langle\alpha^{B}_{j}([C])\rangle = N^{-1}\sum_{k=1}^{N}\langle\alpha^{B}_{k,j}([C])\rangle$,
where $N$ = 64 is the number of residues in the protein. $\langle\alpha^{P}_{k,N}\rangle$ and $\langle\alpha^{P}_{k,D}\rangle$ are displayed as
green and blue lines respectively. Brown dashed lines show $\langle\alpha^{P}_{k,D}\rangle$ for individual residues
of type $k$; the residue indices are indicated by the numbers in red. For the backbone only
six groups (from residues 1, 10, 20, 30, 40, and 50) out of sixty-four backbone groups are
shown. (b) Linear secondary structure representation of protein L. β-strands are shown as
red arrows, the α-helix as a green cylinder, and unstructured regions as a solid black line.
Secondary structure assignments were made using the STRIDE program 6J]. The residues 
corresponding to each secondary structure element are listed below the representation. (c)
$n_k\Delta\alpha^{P}_{k}$ (Eq. 2) as a function of urea concentration for the backbone (green line, with
corresponding ordinate on right), and all other sixteen unique amino acid types in protein L
(with corresponding ordinate on left). For clarity, labels for Met and Ser residues are not
shown. Met and Ser have $n_k\Delta\alpha^{S}_{k}$ values close to zero in this graph.
$\Delta\alpha^{P}_{k} = \langle\alpha^{P}_{k,D}\rangle - \langle\alpha^{P}_{k,N}\rangle$ ($P$ = S or B). For the backbone we plot
$\sum_{k=1}^{N}\Delta\alpha^{B}_{k}$. The inset shows $\Delta\alpha_T$ as a function
of urea concentration. The red arrow indicates $C_m$.

Figure 3 (a) The probability distribution of the total backbone surface area in the DSE
($P(\alpha^{B})$) at various urea concentrations, indicated by the number above each trace. For
comparison, $P(\alpha^{B})$ for the native state ensemble at 6.5 M urea is shown (solid brown line)
as well as the average distribution over both the NSE and DSE at 6.5 M urea (black line).
(b) Same as (a) except distributions are of the accessible surface area of the entire protein.

Figure 4 (a) The ratio $\rho_k = \sigma_{\alpha_{k,D}}/\langle\alpha_{k,D}\rangle$ (see text for explanation) as a function of
urea concentration for the entire protein (black line), the backbone (blue line), and all other
amino acid types found in protein L. (b) The quantity m[C] versus urea concentration for
the full protein (black circles), the backbone groups (red squares), and the Phe, Leu, Ile,
and Ala side chains. Solid lines correspond to linear fits to the data in the range of 5.1 to
7.9 M urea.

Figure 5 The distribution ($P(\alpha_{\mathrm{Ala},D})$) of the solvent accessible surface area of side chains
from the nine individual alanine residues in the denatured state ensemble of protein L at
various urea concentrations. Black, red and green lines correspond to 1 M, 4 M and 8 M
urea respectively. The corresponding alanine for each graph is given by its residue number.
The large changes in $P(\alpha_{\mathrm{Ala},D})$ for a chemically identical residue show that environment
and local structures affect the structures and energetics of the side chains.

Figure 6 The ratio $\alpha^{S}_{i,D}/\alpha^{S}_{k,\max}$ (see text for an explanation) as a function of residue
number $i$ at 8 M urea. The legend indicates the amino acid type for each residue. Only
amino acid types that occur at least four times in protein L, and have at least two of those
residues separated by more than twenty five residues along sequence space, are plotted. For
reference, the linear secondary structure representation of protein L is shown above the
graph.

Figure 7 (a) The residual secondary structure content in the DSE versus urea
concentration. (b) The interaction energy ($E_M$) in the DSE divided by the number of residues in
the secondary structural element, in units of $k_BT$, versus urea concentration for the entire
protein and various secondary structural elements. The inset shows $E_I$, $E_S$, and $E_M$ for the
entire protein versus urea concentration in units of kcal mol$^{-1}$.

Figure 8 The histogram of residue-dependent midpoints of unfolding as a function of
(a) urea concentration at 328 K and (b) temperature at 0 M urea. The $C_m$ for the entire
protein is ~6.6 M, while the melting temperature is 356 K at 0 M urea.

Figure 9 The average $R_g$ (open black circles) and $R_{ee}$ (x's) as a function of [C] for
protein L at 328 K. The values of $R_g^{D}$ (open black circles, dashed line, left axis) and $R_{ee}^{D}$
(x's, dashed line, right axis) as a function of urea concentration are also shown. Lines are a
guide to the eye. The gray vertical line at 6.56 M urea denotes $C_m$.

Figure 10 Table of contents graphic.